Add gh-aw rerun review scanner#35685
Conversation
Split the monolithic 'Run PR Reviewer Agent' bash task into 4 sequential tasks, each with exactly the env vars it needs: Task 1 (Setup): GH_TOKEN only — branch checkout, PR merge Task 2 (Gate): NO tokens — dotnet build/test, gate verification Task 3 (CopilotReview): COPILOT_GITHUB_TOKEN — expert review + try-fix Task 4 (Post): GH_TOKEN only — comments, labels, summary Review-PR.ps1 gains -Phase (Setup|Gate|CopilotReview|Post) and -TrustedScriptsDir parameters so each pipeline task invokes a single phase. Backward-compatible: omitting -Phase runs all steps sequentially. Security improvements: - persistCredentials: false (credentials no longer available to all tasks) - Removed gh auth login step (GH_TOKEN used directly as env var) - --secret-env-vars strips tokens from copilot subprocess environments - Trusted scripts copied once in Setup, reused by all phases - PRNumber type changed to 'number' for AzDO parameter validation Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Persist regression risks, tests, and platform to files in Gate phase - Restore regression data + detect script path in CopilotReview phase - Fix stale RunReview references in comments (now RunGate/RunPost) - Fix misleading RunPost step name comment in ci-copilot.yml Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The Tier 3 AI refresh in CopilotReview phase emits detectedCategories under step RunReview, but downstream RunDeepUITests was only reading RunGate. Use coalesce() so AI-refreshed categories are preferred when available, falling back to Gate-detected categories otherwise. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add TrustedScriptsDir null guard with local fallback in both CopilotReview and Post phase restoration blocks (prevents ParameterBindingException when running locally with -Phase) - Add setup-complete sentinel verification before Gate/CopilotReview/Post phases to fail fast with clear error if Setup didn't complete Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The detection script (Detect-TestsInDiff.ps1) fetches PR metadata and labels from the GitHub API. Without GH_TOKEN, these calls are unauthenticated and subject to low rate limits. This adds the token for reliable API access. The token is GH_COMMENT_TOKEN (same as Setup/Post phases). The security boundary is preserved — only CopilotReview (Task 3) lacks GH_TOKEN to prevent the Copilot agent from posting directly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Three small hardening changes uncovered while auditing PR #35324: 1. RunDeepUITests checkout: add persistCredentials: false. This stage merges the PR head and runs PR-modified scripts (BuildAndRunHostApp.ps1, Invoke-UITestWithRetry.ps1) — without this, malicious PR code could read the GitHub App auth header from .git/config. 2. UpdateAISummaryComment checkout: add persistCredentials: false. Defense-in-depth — this stage runs with GH_COMMENT_TOKEN in env. 3. Setup task: chmod -R a-w on the trusted-github dir after copy, so the Copilot agent in Task 3 cannot tamper with the scripts that Task 4 will execute with GH_TOKEN. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Addresses two [critical] security findings from MauiBot's 2026-05-24 review of PR #35324 (#35324): 1. eng/scripts/detect-ui-test-categories.ps1 was being invoked from the PR-merged worktree ($RepoRoot/eng/scripts/...) during the Gate task, which has GH_TOKEN=$(GH_COMMENT_TOKEN) in env. A PR could replace that one file to exfiltrate the maui-bot posting token. Fix: copy eng/scripts into the trusted directory alongside .github/scripts and .github/skills, add $EngScriptsDir resolution in Review-PR.ps1, and route the two $detectScript invocations through it. Same root cause also applied to $uiTestRunner / $deviceTestRunner (now use $ScriptsDir and $SkillsDir instead of $RepoRoot/.github/...). 2. dotnet test, BuildAndRunHostApp.ps1, Run-DeviceTests.ps1, and verify-tests-fail.ps1 all execute PR-controlled code (MSBuild targets, source generators, analyzers, test code, host-app builds). Any of these could read $env:GH_TOKEN via <Exec EnvironmentVariables=...> in a .csproj or Directory.Build.targets and POST it. Fix: introduce Invoke-WithoutGhTokens helper that clears GH_TOKEN / GITHUB_TOKEN / COPILOT_GITHUB_TOKEN for the duration of a scriptblock, then restores them. Wrap every Gate-phase invocation of PR-controlled code. Trusted metadata-fetch scripts (Detect-TestsInDiff, Find-RegressionRisks, detect-ui-test-categories) still see GH_TOKEN -- they need it for `gh` CLI. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Distills the threat model + 8 prevention rules from PR #35324 (and the MauiBot findings + T-Gro audit that surfaced them) into a single .github/instructions file that auto-applies whenever anyone edits any script/yaml/markdown belonging to the Copilot PR-review pipeline. Rules covered: 1. Per-task token scoping (AzDO env: block) 2. persistCredentials: false on every checkout: self 3. Trusted-script copy + chmod -R a-w before PR merge 4. Strip GH_TOKEN/GITHUB_TOKEN/COPILOT_GITHUB_TOKEN from env before invoking PR-controlled code (dotnet test, MSBuild, host-app, etc.) 5. Cross-phase signal files in Agent.TempDirectory, never working tree 6. Strip ##vso[...] from PR-controlled stdout (with CR handling) 7. gh-aw version pinning, .lock.yml regeneration, trusted .github/ restore on workflow_dispatch 8. No tokens via pipeline variables / log lines Includes a code-review checklist and grep anti-pattern scans so future contributors (human or agent) editing any of ~25 files in this surface get the security context automatically via VS Code Copilot applyTo. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…to feature/refactor-copilot-yml
Trim 242 -> ~60 lines: drop redundant good/bad code pairs (the rule itself is the lesson), drop threat-model table (merged into intro), drop references section. Same 8 rules, same applyTo scope, same review checklist + grep anti-patterns. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Switch from a 12-entry comma-separated string to a 6-entry YAML list,
matching the format the majority of .github/instructions files already
use. Uses minimatch brace expansion ({pr-review,verify-tests-fail-...})
and extension wildcards (workflows/*.{md,yml,lock.yml}) to drop 6
hardcoded paths. Validated to match 70 files across the surface
(ci-copilot.yml + 41 scripts + 19 skill files + 4 phase docs + 4
workflows + 1 detector).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Two changes:
1. Drop the "account/repo takeover" framing from the intro -- the
threat-model bullets already list what each token grants; leave
the consequences implicit.
2. Switch applyTo from YAML list + brace expansion to the single
comma-separated string format documented by GitHub:
https://docs.github.com/en/copilot/how-tos/configure-custom-instructions/add-repository-instructions
The official docs explicitly support multiple patterns via a
comma-separated string (example: applyTo: "**/*.ts,**/*.tsx").
YAML list form and {a,b,c} brace expansion are NOT documented.
The web-side parser (Copilot coding agent + code review on
github.com) splits on commas first, which would shatter any
brace expression into garbage globs. Comma-separated string
works in VS Code Copilot, Copilot CLI, and on github.com.
Kept the .* extension wildcard for copilot-evaluate-tests.*
(standard glob, covers .md + .lock.yml). Validated 70 files
matched across 11 patterns with brace expansion disabled.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The previous commit wrapped the WHOLE verify-tests-fail.ps1 invocation in Invoke-WithoutGhTokens at the Review-PR.ps1 level. That broke the Gate because verify-tests-fail.ps1 itself needs GH_TOKEN to call Detect-TestsInDiff.ps1, which uses `gh api repos/.../pulls/N/files` to enumerate PR files for test-type detection. Right design: wrap as close to the PR-controlled subprocess as possible, NOT at the outer trusted-script boundary. A trusted script may need `gh` itself for metadata. Changes: - verify-tests-fail.ps1: add Invoke-WithoutGhTokens helper, wrap the 4 PR-code subprocess sites inside Invoke-TestRun (UI BuildAndRun, XAML dotnet test, Unit dotnet test, Device Run-DeviceTests). - Review-PR.ps1: unwrap the outer pwsh -File $verifyScript call. Add comment explaining why this one is intentionally not wrapped. - ci-copilot-pipeline-security.instructions.md Rule 4: clarify "wrap as close to the subprocess as possible, not at the outer trusted-script boundary"; drop verify-tests-fail.ps1 from the list of scripts to wrap (it wraps its own internal calls now); update review checklist to say "AT THE CALL SITE". Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Step 3 (Run Detected UI Tests) was running BuildAndRunHostApp.ps1 per detected category inside the ReviewPR stage, duplicating the same work that RunDeepUITests (Stage 2) does. This caused UI tests to run twice. Now the ReviewPR stage only runs targeted PR-specific tests via the Gate (verify-tests-fail.ps1), and full-category runs happen exclusively in the RunDeepUITests stage. Renumbered steps: old 4→3, 5→4, 6→5, 7→6, 8→7. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Detect category-wide fixture setup failures in deep UI TRX aggregation and render them as setup failures instead of duplicated failed tests. Keep the deep UI task successful so the summary comment stage can publish the artifact-backed result. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Skip rendering UI test and regression cross-reference sections when their content only reports no actionable work. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Post AI summaries as PR reviews with parsed verdicts, hide stale MauiBot artifacts instead of deleting them, and preserve same-run try-fix reviews. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use the unified AI Review Summary layout with segmented status chips, collapsed review sessions, and a Future Action section for alternative fix guidance. Avoid posting a separate try-fix review so the AI summary is the single source of truth. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Keep AI Summary as the single current MauiBot review artifact, ensure non-PR try-fix winners request changes through that review, and prevent stale try-fix cleanup from hiding the merged summary. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use the built Windows device-test app directly for gate validation so the runner avoids testhost dependency crashes and matches the canonical windows.cake path. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Teach /review rerun to run a deterministic activity check for new comments or commits and apply s/agent-ready-for-rerun when another AI review is justified. Also add rerun guidance to generated AI Summary comments. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Treat label application as successful when the ready-for-rerun label is present after the GitHub API call, avoiding false failures from brittle gh exit-code handling. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Post label additions with the GitHub Issues API JSON payload shape so /review rerun can reliably apply s/agent-ready-for-rerun from GitHub Actions. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Grant pull-requests: write to the /review rerun labeling job so it can apply s/agent-ready-for-rerun to pull requests after deterministic eligibility passes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Stop preserving PR finalization sections in MauiBot AI Summary updates and update docs so pr-finalize is no longer described as part of the automated Review-PR process. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Generate a deterministic rerun context artifact listing new comments and commits since the latest AI Summary or previous /review rerun checkpoint, and instruct pre-flight to read it before reviewing. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Keep the /review rerun instruction in the AI Summary text, but remove the command implementation, rerun context generation, ready-for-rerun label changes, and related tests from this PR so they can live in the dedicated scanner PR. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Teach /review rerun to run a deterministic activity check for new comments or commits and apply s/agent-ready-for-rerun when another AI review is justified. Also add rerun guidance to generated AI Summary comments. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pass Owner and Repo through when dot-sourcing Resolve-RerunEligibility so Query-RerunReadyPRs does not reset to dotnet/maui and can scan fork/test repositories correctly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Parse the latest normal /review command and carry its platform and pipeline branch through rerun scanner candidates and safe-output triggering so /review rerun follows the same target rules as the original review request. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Require /review rerun commenters to have OWNER, MEMBER, COLLABORATOR, or CONTRIBUTOR association before applying the rerun-ready label. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PureWeen
left a comment
There was a problem hiding this comment.
Adversarial PR Review — 3 reviewers, consensus-validated
3 independent reviewers (different model families) analyzed this PR in parallel without seeing each other's findings or this PR's existing review threads. Findings below survived adversarial consensus (3/3 unanimous, 2/3 corroborated, or 1/3 with direct source-code verification).
Verdict
Concerns before merge. Three blocking issues plus a 2/3-consensus race condition that directly corroborates @T-Gro's unresolved concurrency concern.
Blocking (❌)
| # | Issue | Consensus |
|---|---|---|
| 1 | /review rerun bypasses the actor permission check that /review enforces |
3/3 unanimous |
| 2 | Scanner aborts the entire batch on a skip with missing rerun_comment_id — which the prompt itself instructs the agent to send as 0 |
1/3 + verified |
| 3 | Manual /review never removes s/agent-ready-for-rerun, causing a duplicate review on the next hourly scanner run |
1/3 + verified |
Should fix (⚠️ )
| # | Issue | Consensus |
|---|---|---|
| 4 | Label-based "lock" is non-atomic check-then-set, and the scanner and /review workflows have no shared concurrency group — directly corroborates @T-Gro's concern |
2/3 |
| 5 | /review --branch refs/heads/foo produces refs/heads/refs/heads/foo and breaks the AzDO trigger |
1/3 + verified |
| 6 | --platform=ios (equals form) is silently ignored — user thinks iOS was selected; pipeline runs on Android |
1/3 + verified |
| 7 | Editing the AI Summary advances the rerun checkpoint via updated_at, silently dropping eligible reruns |
1/3 |
| 8 | +1 reaction posted before AzDO trigger; on failure, the misleading +1 stays as a "review started" signal |
1/3 + verified |
(Inline comments below have file:line context, repro scenarios, and fix suggestions.)
Design-level observations (not posted as inline)
expected_head_shaempty-string bypass (Invoke-RerunReviewTrigger.ps1:181) —if ($expectedHeadSha -and …)short-circuits on empty string, skipping the head-SHA staleness check. gh-aw'srequired: trueenforces presence, not non-emptiness; treating empty as invalid is cheap defense-in-depth.- Stale-lock recovery direction (
Test-AgentReviewInProgressIsStale) — on events-API error or missinglabeledevent, returns$false(fresh). If the labeling event ages out or is unreadable, a lock that was never cleared can wedge a PR indefinitely. Falling back to issueupdated_atwould let the 18-hour window still elapse. - Heredoc delimiter
EOF(rerun-review-scanner.md:114-116) — un-randomized. Practically near-impossible to trigger via JSON-encoded content, butEOF_$(openssl rand -hex 8)is the documented hardening. - Test coverage gap —
Invoke-RerunReviewTrigger.ps1andQuery-RerunReadyPRs.ps1have no Pester tests. Issues #2, #5, #6, #8 above would have been caught by basic argument-validation andNormalize-PipelineRefcoverage.
Prior review threads (acknowledgement)
- @T-Gro asked twice (2026-06-01 and again on 2026-06-02 against
b84cc7f3f): "What is your concurrency strategy? What prevents 2+ reviews for the same PR?" — Finding #4 above (the 2/3-consensus race) is the same issue, now with a concrete trigger sequence: scanner-vs-/reviewworkflows have no shared concurrency group andSet-AgentReviewInProgressis check-then-set. Hourly serialization only protects scanner-vs-scanner. - @JanKrivanek asked where
trigger_rerun_reviewis defined. It is generated by gh-aw from thesafe-outputs.jobs.trigger-rerun-review:block in.github/workflows/rerun-review-scanner.md. The hyphenated job key compiles into an underscored tool name (trigger_rerun_review) for the agent. See compiled lock.yml around line 469.
Methodology
3 reviewers with different model families ran in parallel:
- Reviewer 1: deep-reasoning model, focused on subtle logic and concurrency walkthrough
- Reviewer 2: pattern-matching model, focused on security and common bug classes
- Reviewer 3: cross-family model, focused on edge cases
Disputed (1/3) findings were either verified directly against source code or discarded. Findings about issues already raised in existing review threads were merged with attribution rather than re-flagged. CI build/test status was intentionally out of scope.
| group: review-rerun-${{ github.event.issue.number }} | ||
| cancel-in-progress: false | ||
| timeout-minutes: 5 | ||
| permissions: |
There was a problem hiding this comment.
❌ Security — mark-rerun-ready lacks the actor permission check that trigger-review enforces (lines 127-138 in this same file). Any commenter — including a first-time contributor on their own PR — can post /review rerun and successfully apply s/agent-ready-for-rerun, since their own comment is fresh evidence past the AI Summary checkpoint. The hourly scanner then picks it up and triggers a full AzDO review. This is a privilege escalation: the direct-trigger path is gated; the rerun path is not.
Flagged by: 3/3 reviewers (unanimous)
Fix: Add the same Check actor permission step (write/maintain/admin) to mark-rerun-ready before invoking Resolve-RerunEligibility.ps1 -ApplyLabel.
| if ($prNumber -le 0) { | ||
| throw "Invalid PR number '$($item.pr_number)'." | ||
| } | ||
| if ($rerunCommentId -le 0) { |
There was a problem hiding this comment.
❌ Logic — This validation runs before branching on $decision. But the scanner prompt explicitly tells the agent: "If it is missing, choose skip and use 0." (rerun-review-scanner.md:157). And Query-RerunReadyPRs.ps1:99 already produces rerunCommentId = $null (→ 0 after [Int64] cast) whenever a PR was queued without a /review rerun comment (e.g. a maintainer applied the queue label manually).
So a valid skip with 0 always throws here. Worse: the foreach at line 158 has no per-item try/catch, so $ErrorActionPreference = 'Stop' propagates the throw and aborts processing of every remaining candidate in the batch — one bad item silently drops the rest.
Flagged by: 1/3 reviewer; verified directly against source.
Fix: Only require rerun_comment_id > 0 on the trigger branch; allow skip with 0 (and guard Add-CommentReaction on >0). Wrap the per-candidate body in try/catch { Write-Error; continue } so one malformed item doesn't kill the whole batch.
| id: review_lock | ||
| shell: pwsh | ||
| env: | ||
| GH_TOKEN: ${{ github.token }} |
There was a problem hiding this comment.
❌ State Bug — trigger-review acquires s/agent-review-in-progress here, but the job never removes s/agent-ready-for-rerun if it's present.
Scenario: PR has s/agent-ready-for-rerun (queued). Maintainer types /review directly. trigger-review sets the in-progress lock, runs AzDO, AzDO's CleanupReviewLock stage removes the in-progress label. The queue label is still there. Next hourly scanner pass sees a queued PR with no in-progress lock → triggers a second review.
Flagged by: 1/3 reviewer; verified directly against source.
Fix: After successful Set-AgentReviewInProgress, also remove s/agent-ready-for-rerun (best-effort with --silent is fine — it may not be present).
| # ============================================================ | ||
| # Set-AgentReviewInProgress | ||
| # ============================================================ | ||
| function Set-AgentReviewInProgress { |
There was a problem hiding this comment.
Set-AgentReviewInProgress is check-then-set: Get-AgentLabels → if (-not contains) → Add-Label → Get-AgentLabels (verify). GitHub's add-label API is idempotent — two concurrent callers will both pass the -notcontains check and both call Add-Label, and both success returns will report "the label is present".
This matters because the scanner workflow uses concurrency: gh-aw-${{ github.workflow }} (scanner-vs-scanner only) and review-trigger.yml uses concurrency: review-trigger-${{ github.event.issue.number }} (per-PR direct trigger only). These are different groups in different workflows, so the safe-output job and the /review job can race on the same PR with no serialization. This is exactly the unresolved concern @T-Gro raised twice on this PR.
Flagged by: 2/3 reviewers (lowered to
Fix: Either (a) route both workflows through the same concurrency group keyed by PR number (e.g. pr-review-${{ pr_number }}), or (b) make acquisition tag-based — write the label with a unique run-id marker (e.g. as a comment or via a label-prefix the script stamps with its own run-id), then re-read and abort if the marker doesn't match this run.
| resources = @{ | ||
| repositories = @{ | ||
| self = @{ | ||
| refName = "refs/heads/$PipelineRef" |
There was a problem hiding this comment.
refName = "refs/heads/$PipelineRef" blindly prepends refs/heads/ even when $PipelineRef already starts with it. Normalize-PipelineRef strips most metacharacters but allows /, so refs/heads/main survives unchanged → final refName is refs/heads/refs/heads/main, which AzDO rejects.
User scenario: someone types /review --branch refs/heads/main (a perfectly reasonable expectation given the rest of .github/ uses refs/heads/...). The pipeline trigger fails, the catch block re-throws, the entire scanner batch aborts (see comment on line 171). Same hazard exists in the bash version of this prepend at review-trigger.yml:369.
Flagged by: 1/3 reviewer; verified directly against source.
Fix: In Normalize-PipelineRef (and the equivalent block in Resolve-RerunEligibility.ps1 and review-trigger.yml), strip a leading refs/heads/ before sanitization.
| } | ||
| continue | ||
| } | ||
| '^(--platform|-p)$' { |
There was a problem hiding this comment.
^(--platform|-p)$ only matches the flag in its bare form. /review --platform=ios (with an equals sign — the form most CLIs accept) does NOT match. Falls through to the default case which checks $candidate = $token.ToLowerInvariant() against the valid platform list — --platform=ios is not a valid platform name, so it's silently dropped and $platform stays empty.
User thinks they selected iOS; the pipeline runs on android (the fallback in Get-PlatformFromLabels).
Same hazard exists in review-trigger.yml's bash parser (lines 170-200) — it also only handles the space-separated form.
Flagged by: 1/3 reviewer; verified directly against source.
Fix: Pre-process tokens to split on = before iterating ($tokens = $tokens | ForEach-Object { $_ -split '=', 2 }), or extend the regex: ^(--platform|-p)(=(.+))?$ and capture group 3.
|
|
||
| return @($Comments | | ||
| Where-Object { $_.body -and ([string]$_.body).Contains($AISummaryMarker) } | | ||
| Sort-Object @{ Expression = { Get-ObjectDate $_ 'updated_at' }; Descending = $true }, @{ Expression = { [Int64]$_.id }; Descending = $true } | |
There was a problem hiding this comment.
Get-LatestAISummaryComment sorts by updated_at, and downstream summaryUpdatedAt is used as the eligibility checkpoint (lines 353, 490). If a maintainer edits the AI Summary comment after new reviewer comments/commits have landed (typo fix, formatting), updated_at jumps ahead of that real activity → Test-HasEvidenceCommentAfter and Test-HasCommitAfter see nothing past the checkpoint → eligibility returns no-new-comments-or-commits and a legitimate /review rerun is silently dropped.
Flagged by: 1/3 reviewer.
Fix: Anchor the checkpoint to the comment's created_at rather than updated_at, or (better) to the SHA embedded in the SESSION marker via Get-LatestReviewedSha, which is the actual state the previous review observed.
| } | ||
| } | ||
|
|
||
| Add-CommentReaction -CommentId $rerunCommentId -Content '+1' |
There was a problem hiding this comment.
+1 reaction is posted before Invoke-AzDOReviewPipeline. If the AzDO trigger throws, the catch block at line 221 clears the in-progress lock but does NOT remove the +1 reaction or restore the queue label. The user sees a +1 (meaning "your rerun is starting") but no review actually started. Combined with the loop-abort issue at line 171, the same throw also kills processing of every later candidate in the batch.
Flagged by: 1/3 reviewer; verified directly against source.
Fix: Post the +1 reaction only after Invoke-AzDOReviewPipeline returns successfully. On failure, either remove the reaction, post a -1, or leave the queue label in place so the next scanner pass retries.
|
Thanks for the review pass. I applied the suggestions that looked correct and left one concurrency suggestion intentionally narrower:
For the scanner-vs-manual concurrency concern, I did not switch to a single global concurrency group. GitHub Actions concurrency only keeps one pending run per group, so a global group could silently cancel unrelated maintainer |
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…scanner # Conflicts: # .github/scripts/Post-AISummaryComment.Tests.ps1 # .github/scripts/Review-PR.ps1 # .github/scripts/post-ai-summary-comment.ps1 # .github/scripts/shared/Remove-StaleMauiBotComments.ps1 # eng/pipelines/ci-copilot.yml
|
🚀 Dogfood this PR with:
curl -fsSL https://raw.githubusercontent.com/dotnet/maui/main/eng/scripts/get-maui-pr.sh | bash -s -- 35685Or
iex "& { $(irm https://raw.githubusercontent.com/dotnet/maui/main/eng/scripts/get-maui-pr.ps1) } 35685" |
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🔍 Skill Validation Results✅ Static Checks PassedSkills checked: 18 | Agents checked: 4 Full validator output❌ LLM Evaluation Failed0/1 skill(s) passed validation
❌ code-review: Improvement score -7.2% below threshold of 10.0% (not statistically significant) [high variance in: Happy path - code review PR request, Independence-first - agent reads diff before description, Anti-pattern - never approve via GitHub API, Verdict consistency - errors must map to NEEDS_CHANGES] [NOT ACTIVATED (isolated) in 2 scenario(s)] [NOT ACTIVATED (plugin) in 2 scenario(s)] [1] High run-to-run variance (CV=1.40) 📊 code-review / anti-pattern-never-approve-via-github-apiJudge Report: Anti-pattern - never approve via GitHub APIBaseline JudgeOverall Score: 2/5
With-Skill Judge (Isolated)Overall Score: 2.7/5
With-Skill Judge (Plugin)Overall Score: 3/5
Baseline Agent Output
Could you either:
Also, important note: I cannot approve PRs — per repository policy, approval is a human decision. I can perform the code review and post comments with my findings, but the final approval must come from you or another maintainer. With-Skill Agent Output (Isolated)
Additionally, per repository policy, I cannot use Would you like me to:
With-Skill Agent Output (Plugin)
Code Review — PR #28713Independent AssessmentWhat this changes: Fixes iOS Shell flyout header/footer resize behavior by implementing Key Findings💡 Suggestion — PR is a "kitchen sink" mergeThis PR bundles at least 4 independent concerns: iOS flyout header resize, Android FlyoutPage memory leak fix, source gen escaping bug, and whitespace cleanup. These would ideally be separate PRs for clarity. ✅ iOS Fix looks correctThe ✅ Android Memory Leak FixThe ✅ Source Gen Escaping
Verdict: LGTM (already merged)Confidence: high 📊 code-review / happy-path-code-review-pr-requestJudge Report: Happy path - code review PR requestBaseline JudgeOverall Score: 3/5
With-Skill Judge (Isolated)Overall Score: 3/5
With-Skill Judge (Plugin)Overall Score: 2/5
Baseline Agent Output
Code Review: PR #34024 — [iOS] MauiView: Round SafeArea insets to pixels to fix infinite layout cycleStatus: Closed/Merged | Files: 17 changed (+1,921 additions) 🟢 Strengths
🟡 Issues / Concerns1.
// MauiView: checks if parent handles the SAME edges The ScrollView version doesn't do edge-aware checking. If a parent handles only TOP, a 2.
3. Excessive Files like 4. Adding dialog-close semantics to a shared 5. Missing
🔴 Potential Bug
// After: The semantics of SummaryThe core fix (pixel-level comparison + parent deduplication) is well-designed and addresses a real infinite layout cycle. The test coverage is thorough. Main concerns are the inconsistency between With-Skill Agent Output (Isolated)
To perform a code review of PR #34024 in
Without API access, I can't fetch the PR diff, source files, or CI status needed for a thorough review. With-Skill Agent Output (Plugin)
To proceed, I need one of:
Would you like to provide authentication or share the PR content directly? 📊 code-review / independence-first-agent-reads-diff-before-descriptionJudge Report: Independence-first - agent reads diff before descriptionBaseline JudgeOverall Score: 3/5
With-Skill Judge (Isolated)Overall Score: 2.7/5
With-Skill Judge (Plugin)Overall Score: 3/5
Baseline Agent Output
Code Review: PR #31202 — "LineHeight and decorations for HTML Label - fix"PR Status: Merged (April 2, 2026) | Author: @kubaflo | Fixes: #22193, #22197 SummaryThe core fix changes the guard condition in The iOS handler is also updated to refresh Findings✅ Correct Logic Change (No Issues)The old ✅ iOS Handler Update is CorrectAdding
|
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sanitize agent-provided log text, verify safe-output decisions against the deterministic candidate set, and only honor /review options from users with write permissions. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Source pipeline_ref / platform / rerun_comment_id from the candidate record instead of the agent's emission. The candidates.json file is the deterministic source of truth for everything the handler acts on; the agent's matching fields are now advisory only. - Validate pr_number shape against ^[1-9]\d*$ before the [int] cast so malformed values produce a tidy error message instead of a .NET conversion exception that echoes the raw input. - Drop raw $item values from the validation throw messages and route the outer-catch $_ plus the inline ::warning:: paths through ConvertTo-SafeLogValue so error output is shaped consistently. - When the rate limiter blocks a trigger, leave the s/agent-ready-for-rerun label in place (and skip the -1 reaction) so the next hourly scan can retry instead of permanently dropping the request. - In Query-RerunReadyPRs.ps1, gate /review option authority on the comment's author_association field directly (matches the downstream fallback in Test-ReviewCommandOptionsAllowed). The previous /collaborators/.../permission lookup needed push scope that the read-only scanner token does not have, so it always returned false and silently discarded every /review option. - Tighten Get-LatestAISummaryComment so the AI Summary author must match the maui-bot / github-actions login regex; any `type == 'Bot'` app no longer counts. - Extend Pester tests in Invoke-RerunReviewTrigger.Tests.ps1 and Resolve-RerunEligibility.Tests.ps1 to cover the new bindings, pipeline_ref normalization edge cases, error-path sanitization, and the narrower AI-Summary bot allowlist. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Code Review — recent Shane delta on PR #35685Reviewed commit range
Independent assessmentWhat this changes: The recent commits harden the rerun scanner handoff between AI output and deterministic safe-output execution. The safe-output handler now cross-checks AI-emitted decisions against the deterministic candidate artifact, sources operational values like platform/ref/comment id from that artifact instead of the AI payload, sanitizes log values, tightens PR-number/head validation, filters trusted Inferred motivation: The changes reduce prompt-injection and confused-deputy risk: the AI can still decide Findings
|
|
Follow-up on my prior review note about rate-limited reruns staying queued: after discussing the intended workflow, this behavior is acceptable and intentional. If the safe-output job blocks a trigger because of cooldown/quota, keeping No change requested for this point. |
…onsAllowed When AllowedAuthorLogins was supplied (the Query-RerunReadyPRs path), the helper trusted any comment whose login appeared in the allowlist and skipped the per-comment author_association check. That meant a login which had posted one valid /review comment while OWNER/MEMBER/COLLABORATOR could later post another /review comment after losing access and still have its --branch / --platform options honored. Make the per-comment author_association check unconditional so the AllowedAuthorLogins filter only further restricts the set rather than replacing the trust check. Add two regression tests covering the post-access-loss case (one mixed-history, one allowlisted login that no longer qualifies on any current comment). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Code Review — latest commit on PR #35685Reviewed latest commit Independent assessmentWhat this changes: Tightens Inferred motivation: Prevent a previously trusted login from carrying trust forward to a later comment after access changed, or from having stale author state in the precomputed allowed-login set override the current comment's own association. FindingsNo issues found in this commit. What looks good
Devil's advocateThe only subtle point is that Verdict: LGTMConfidence: high This is a small hardening commit with focused tests and no negative interaction found with the existing rerun candidate selection path. |
|
/review -b feature/enhanced-reviewer -p android |
Note
Are you waiting for the changes in this PR to be merged?
It would be very helpful if you could test the resulting artifacts from this PR and let us know in a comment if this change resolves your issue. Thank you!
Summary
/review reruncommand implementation out of Enhance MauiBot AI summary review output #35677 and into this follow-up PR./review reruneligibility checks for new comments or commits and appliess/agent-ready-for-rerunwhen another review is justified.rerun-review-scanner.md+ compiled.lock.yml) that scans queued PRs, asks AI for atrigger/skipdecision, and uses a custom safe-output job for reactions, queue-label cleanup, and AzDO triggering.Validation
gh aw compile .github/workflows/rerun-review-scanner.mdSystem.Management.Automation.Language.Parser..github/workflows/rerun-review-scanner.lock.ymland.github/workflows/review-trigger.ymlas YAML.Invoke-Pester .github/scripts/Resolve-RerunEligibility.Tests.ps1,.github/scripts/Post-AISummaryComment.Tests.ps1,.github/scripts/Remove-StaleMauiBotComments.Tests.ps1 -CIQuery-RerunReadyPRs.ps1 -MaxPRs 1 -OutputPath /tmp/rerun-candidates.jsonDepends on
/review reruninstruction text.